Generative and Nonparametric Approaches for Conditional Distribution Estimation: Methods, Perspectives, and Comparative Evaluations

Chin, Yen-Shiu, Jou, Zhi-Yu, Morimoto, Toshinari, Wang, Chia-Tse, Chang, Ming-Chung, Yen, Tso-Jung, Huang, Su-Yun, Hsing, Tailen

arXiv.org Machine Learning

The inference of conditional distributions is a fundamental problem in statistics, essential for prediction, uncertainty quantification, and probabilistic modeling. A wide range of methodologies have been developed for this task. This article reviews and compares several representative approaches spanning classical nonparametric methods and modern generative models. We begin with the single-index method of Hall and Yao (2005), which estimates the conditional distribution through a dimension-reducing index and nonparametric smoothing of the resulting one-dimensional cumulative conditional distribution function. We then examine the basis-expansion approaches, including FlexCode (Izbicki and Lee, 2017) and DeepCDE (Dalmasso et al., 2020), which convert conditional density estimation into a set of nonparametric regression problems. In addition, we discuss two recent generative simulation-based methods that leverage modern deep generative architectures: the generative conditional distribution sampler (Zhou et al., 2023) and the conditional denoising diffusion probabilistic model (Fu et al., 2024; Yang et al., 2025). A systematic numerical comparison of these approaches is provided using a unified evaluation framework that ensures fairness and reproducibility. The performance metrics used for the estimated conditional distribution include the mean-squared errors of conditional mean and standard deviation, as well as the Wasserstein distance. We also discuss their flexibility and computational costs, highlighting the distinct advantages and limitations of each approach.
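The basis-expansion idea behind FlexCode can be illustrated in a few lines: expand the conditional density in an orthonormal basis, f(y|x) ≈ Σ_j β_j(x) φ_j(y), and estimate each coefficient β_j(x) = E[φ_j(Y) | X = x] by regression. A minimal sketch, assuming a cosine basis on [0, 1] and a Nadaraya-Watson smoother as the regression method (the actual method accepts any regression learner; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: Y | X = x ~ Normal(x, 0.1^2), kept inside [0, 1].
n = 2000
x = rng.uniform(0.2, 0.8, n)
y = np.clip(x + 0.1 * rng.standard_normal(n), 0.0, 1.0)

# Cosine basis on [0, 1]: phi_0 = 1, phi_j(y) = sqrt(2) cos(j pi y).
def cosine_basis(y, n_basis):
    cols = [np.ones_like(y)]
    for j in range(1, n_basis):
        cols.append(np.sqrt(2.0) * np.cos(j * np.pi * y))
    return np.column_stack(cols)

n_basis = 15
Phi = cosine_basis(y, n_basis)            # (n, n_basis)

# Estimate beta_j(x0) = E[phi_j(Y) | X = x0] by Nadaraya-Watson smoothing.
def beta_hat(x0, bandwidth=0.05):
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return (w[:, None] * Phi).sum(axis=0) / w.sum()

# Density estimate f(y | x0) = sum_j beta_j(x0) phi_j(y), clipped at zero.
def cde(y_grid, x0):
    f = cosine_basis(y_grid, n_basis) @ beta_hat(x0)
    return np.maximum(f, 0.0)

y_grid = np.linspace(0, 1, 201)
f = cde(y_grid, x0=0.5)
# The estimated density should peak near y = 0.5.
print(y_grid[np.argmax(f)])
```

The key point of the reduction: each coefficient is an ordinary regression problem, so any off-the-shelf learner (including a deep network, as in DeepCDE) can be plugged in for `beta_hat`.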


Factorizable joint shift revisited

Tasche, Dirk

arXiv.org Machine Learning

Such failure can be caused by distribution shift (also known as dataset shift) between the training and test datasets. For this reason, distribution shift and domain adaptation (a notion comprising techniques for tackling distribution shift) have been major research topics in machine learning for some time. This paper takes the perspective of Kouw and Loog (2021) and studies the case where feature observations from the test dataset are available for analysis but observations of labels are missing. Under these circumstances, without any assumptions on the nature of the distribution shift between the training and test datasets, meaningful prediction of the labels in the test dataset or of their distribution is not feasible. See Kouw and Loog (2021) for a survey of approaches to domain adaptation and their related assumptions. Arguably, covariate shift (also known as population drift) and label shift (also known as prior probability shift or target shift) are the most popular specific distribution shift assumptions, both for their intuitiveness and their computational manageability. However, exclusive covariate and label shift assumptions have been criticised as insufficient for common domain adaptation tasks (e.g.
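As a concrete instance of one specific shift assumption, covariate shift can be corrected by importance weighting with the density ratio p_test(x)/p_train(x): the conditional p(y|x) is assumed unchanged, so reweighted training data stand in for the unlabeled test population. A minimal sketch with one-dimensional features and Gaussian fits for both marginals (real density-ratio estimators are considerably more sophisticated; everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Covariate shift: the conditional p(y|x) is shared, the marginals p(x) differ.
x_tr = rng.normal(0.0, 1.0, 5000)              # training covariates ~ N(0, 1)
y_tr = np.sin(x_tr) + 0.1 * rng.standard_normal(5000)
x_te = rng.normal(1.0, 1.0, 5000)              # test covariates ~ N(1, 1), no labels

# Density-ratio weights w(x) = p_test(x) / p_train(x), here from Gaussian fits.
def gauss_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

w = gauss_pdf(x_tr, x_te.mean(), x_te.std()) / gauss_pdf(x_tr, x_tr.mean(), x_tr.std())

# Self-normalized importance-weighted estimate of the test-population mean of Y,
# computed from labeled training data plus the weights alone.
est = np.average(y_tr, weights=w)
true = np.sin(x_te).mean()                     # unobservable in practice
print(est, true)
```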



Schrodinger Neural Network and Uncertainty Quantification: Quantum Machine

Hammad, M. M.

arXiv.org Artificial Intelligence

We introduce the Schrodinger Neural Network (SNN), a principled architecture for conditional density estimation and uncertainty quantification inspired by quantum mechanics. The SNN maps each input to a normalized wave function on the output domain and computes predictive probabilities via the Born rule. The SNN departs from standard parametric likelihood heads by learning complex coefficients of a spectral expansion (e.g., Chebyshev polynomials) whose squared modulus yields the conditional density $p(y|x)=\left|\psi_x(y)\right|^2$ with analytic normalization. This representation confers three practical advantages: positivity and exact normalization by construction; native multimodality through interference among basis modes, without explicit mixture bookkeeping; and closed-form (or efficiently computable) functionals, such as moments and several calibration diagnostics, expressed as quadratic forms in coefficient space. We develop the statistical and computational foundations of the SNN, including (i) training by exact maximum likelihood with a unit-sphere coefficient parameterization, (ii) physics-inspired quadratic regularizers (kinetic and potential energies) motivated by uncertainty relations between localization and spectral complexity, (iii) scalable low-rank and separable extensions for multivariate outputs, (iv) operator-based extensions that represent observables, constraints, and weak labels as self-adjoint matrices acting on the amplitude space, and (v) a comprehensive framework for evaluating multimodal predictions. The SNN provides a coherent, tractable framework that elevates probabilistic prediction from point estimates to physically inspired amplitude-based distributions.
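The Born-rule construction can be sketched directly: choose complex coefficients for a Chebyshev expansion, normalize them through the basis Gram matrix so the squared modulus integrates to one, and read off a valid density. A minimal numerical sketch (the input-dependent network, unit-sphere parameterization, and training loop are omitted):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(2)
n_modes = 6

# Complex spectral coefficients of the wave function psi(y) = sum_j c_j T_j(y).
c = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)

# Gram matrix G_jk = integral of T_j(y) T_k(y) dy over [-1, 1] (by quadrature),
# so the normalization integral is the quadratic form c^H G c.
yq, wq = np.polynomial.legendre.leggauss(64)
T = np.column_stack([C.chebval(yq, np.eye(n_modes)[j]) for j in range(n_modes)])
G = T.T @ (wq[:, None] * T)

# Normalize coefficients so the Born-rule density integrates to one.
c = c / np.sqrt(np.real(np.conj(c) @ G @ c))

def density(y):
    """p(y) = |psi(y)|^2 via the Born rule: nonnegative by construction."""
    psi = C.chebval(y, c)
    return np.abs(psi) ** 2

# Check: nonnegative and integrates to ~1 (quadrature is exact for this degree).
total = np.sum(wq * density(yq))
print(total)   # ~ 1.0
```

Interference among the complex modes is what produces multimodality here without any mixture-component bookkeeping.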


Asymptotic Expansion for Nonlinear Filtering in the Small System Noise Regime

Kurisaki, Masahiro

arXiv.org Machine Learning

We propose a new asymptotic expansion method for nonlinear filtering, based on a small parameter in the system noise. The conditional expectation is expanded as a power series in the noise level, with each coefficient computed by solving a system of ordinary differential equations. This approach mitigates the trade-off between computational efficiency and accuracy inherent in existing methods such as Gaussian approximations and particle filters. Moreover, by incorporating an Edgeworth-type expansion, our method captures complex features of the conditional distribution, such as multimodality, with significantly lower computational cost than conventional filtering algorithms.


Flexible Selective Inference with Flow-based Transport Maps

Liu, Sifan, Panigrahi, Snigdha

arXiv.org Machine Learning

Data-carving methods perform selective inference by conditioning the distribution of data on the observed selection event. However, existing data-carving approaches typically require an analytically tractable characterization of the selection event. This paper introduces a new method that leverages tools from flow-based generative modeling to approximate a potentially complex conditional distribution, even when the underlying selection event lacks an analytical description -- take, for example, the data-adaptive tuning of model parameters. The key idea is to learn a transport map that pushes forward a simple reference distribution to the conditional distribution given selection. This map is efficiently learned via a normalizing flow, without imposing any further restrictions on the nature of the selection event. Through extensive numerical experiments on both simulated and real data, we demonstrate that this method enables flexible selective inference by providing: (i) valid p-values and confidence sets for adaptively selected hypotheses and parameters, (ii) a closed-form expression for the conditional density function, enabling likelihood-based and quantile-based inference, and (iii) adjustments for intractable selection steps that can be easily integrated with existing methods designed to account for the tractable steps in a selection procedure involving multiple steps.
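In one dimension the transport-map idea reduces to a monotone quantile map. The following is a much-simplified stand-in for the normalizing flow: quantile matching pushes a standard-normal reference forward to the conditional distribution given a simple selection event (sample mean positive), whose conditional law we can check against the truncated normal:

```python
import numpy as np

rng = np.random.default_rng(3)

# Selection event: we only analyze datasets whose test statistic is positive.
reps = 200_000
means = rng.normal(0.0, 1.0, reps)             # sampling dist. of the statistic
selected = means[means > 0.0]                  # conditional-on-selection sample

# 1D stand-in for the flow: a monotone transport map T fit by quantile
# matching, pushing a standard-normal reference onto the selected sample.
qs = np.linspace(0.001, 0.999, 999)
ref_q = np.quantile(rng.normal(0.0, 1.0, reps), qs)   # reference quantiles
tgt_q = np.quantile(selected, qs)                      # conditional quantiles

def transport(z):
    return np.interp(z, ref_q, tgt_q)          # piecewise-linear monotone map

# Push fresh reference draws through the map; their distribution should match
# the standard normal truncated to (0, inf).
z = rng.normal(0.0, 1.0, 50_000)
pushed = transport(z)
print(pushed.mean())   # truncated-normal mean sqrt(2/pi) ~ 0.798
```

A normalizing flow replaces the interpolated quantile map with a parametric invertible network, which additionally yields the closed-form conditional density via the change-of-variables formula.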


Nonparametric Factor Analysis and Beyond

Zheng, Yujia, Liu, Yang, Yao, Jiaxiong, Hu, Yingyao, Zhang, Kun

arXiv.org Machine Learning

Nearly all identifiability results in unsupervised representation learning inspired by, e.g., independent component analysis, factor analysis, and causal representation learning rely on assumptions of additive independent noise or noiseless regimes. In contrast, we study the more general case where noise can take arbitrary forms, depend on latent variables, and be non-invertibly entangled within a nonlinear function. We propose a general framework for identifying latent variables in nonparametric noisy settings. We first show that, under suitable conditions, the generative model is identifiable up to certain submanifold indeterminacies even in the presence of non-negligible noise. Furthermore, under structural or distributional variability conditions, we prove that latent variables of general nonlinear models are identifiable up to trivial indeterminacies. Based on the proposed theoretical framework, we have also developed corresponding estimation methods and validated them in various synthetic and real-world settings. Interestingly, our estimate of true GDP growth from alternative measurements suggests more insightful information about the economies than official reports provide. We expect our framework to provide new insight into how both researchers and practitioners deal with latent variables in real-world scenarios.


Principal Graph Encoder Embedding and Principal Community Detection

Shen, Cencheng, Dong, Yuexiao, Priebe, Carey E., Larson, Jonathan, Trinh, Ha, Park, Youngser

arXiv.org Machine Learning

In this paper, we introduce the concept of principal communities and propose a principal graph encoder embedding method that concurrently detects these communities and achieves vertex embedding. Given a graph adjacency matrix with vertex labels, the method computes a sample community score for each community, ranking them to measure community importance and estimate a set of principal communities. The method then produces a vertex embedding by retaining only the dimensions corresponding to these principal communities. Theoretically, we define the population version of the encoder embedding and the community score based on a random Bernoulli graph distribution. We prove that the population principal graph encoder embedding preserves the conditional density of the vertex labels and that the population community score successfully distinguishes the principal communities. We conduct a variety of simulations to demonstrate the finite-sample accuracy in detecting ground-truth principal communities, as well as the advantages in embedding visualization and subsequent vertex classification. The method is further applied to a set of real-world graphs, showcasing its numerical advantages, including robustness to label noise and computational scalability.
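The core embedding step is a single matrix product: multiply the adjacency matrix by a label-indicator matrix normalized by community sizes, so each embedding coordinate estimates a vertex's connection rate to one community. A minimal sketch on a two-community stochastic block model (the community scores and principal-community selection are omitted; parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Two-community stochastic block model: p_in = 0.5, p_out = 0.1.
n, k = 200, 2
labels = rng.integers(0, k, n)
P = np.where(labels[:, None] == labels[None, :], 0.5, 0.1)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                              # symmetric, no self-loops

# Graph encoder embedding (sketch): Z = A @ W, where column j of W is the
# indicator of community j scaled by 1 / (community size).
W = np.zeros((n, k))
for j in range(k):
    idx = labels == j
    W[idx, j] = 1.0 / idx.sum()
Z = A @ W                                # (n, k) vertex embedding

# Row i of Z estimates vertex i's connection rates to the k communities;
# the within-community coordinate should dominate under this block model.
own = Z[np.arange(n), labels]
other = Z[np.arange(n), 1 - labels]
print((own > other).mean())              # close to 1.0
```

The principal variant then ranks communities by a sample score and keeps only the embedding dimensions belonging to the principal ones.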


Conformal Inference of Individual Treatment Effects Using Conditional Density Estimates

Wang, Baozhen, Qiao, Xingye

arXiv.org Machine Learning

In an era where diverse and complex data are increasingly accessible, the precise prediction of individual treatment effects (ITE) becomes crucial across fields such as healthcare, economics, and public policy. Current state-of-the-art approaches, while providing valid prediction intervals through Conformal Quantile Regression (CQR) and related techniques, often yield overly conservative prediction intervals. In this work, we introduce a conformal inference approach to ITE using the conditional density of the outcome given the covariates. We leverage the reference distribution technique to efficiently estimate the conditional densities as the score functions under a two-stage conformal ITE framework. We show that our prediction intervals are not only marginally valid but are narrower than existing methods. Experimental results further validate the usefulness of our method.
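The density-as-conformity-score idea can be sketched with plain split conformal prediction: score each calibration pair by its estimated conditional density and keep, for a new x, every y whose density clears the calibrated threshold, i.e., a highest-density region. A minimal sketch with a known Gaussian conditional density standing in for the estimate (the paper's two-stage ITE framework and reference-distribution technique are not shown):

```python
import numpy as np

rng = np.random.default_rng(5)

# Data: Y | X = x ~ N(x, 0.5^2). Pretend this conditional density was fitted.
def f_hat(y, x):
    return np.exp(-0.5 * ((y - x) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

# Calibration scores s_i = f_hat(y_i | x_i); low density = more surprising.
n_cal = 2000
x_cal = rng.uniform(-1, 1, n_cal)
y_cal = x_cal + 0.5 * rng.standard_normal(n_cal)
scores = f_hat(y_cal, x_cal)

# Threshold for miscoverage alpha: the appropriate lower empirical quantile
# of the calibration scores (with the finite-sample n + 1 correction).
alpha = 0.1
k = int(np.floor(alpha * (n_cal + 1)))
tau = np.sort(scores)[k - 1]

def prediction_set(x, y_grid):
    """Highest-density region: all y whose density clears the threshold."""
    return y_grid[f_hat(y_grid, x) >= tau]

# Marginal coverage check on fresh test data.
x_te = rng.uniform(-1, 1, 5000)
y_te = x_te + 0.5 * rng.standard_normal(5000)
covered = f_hat(y_te, x_te) >= tau
print(covered.mean())                    # ~ 1 - alpha = 0.9
```

Because the score adapts to the local shape of the conditional density, the resulting sets can be narrower than quantile-regression-based intervals while keeping marginal validity.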


Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

Chai, Jinhang, Chen, Elynn, Fan, Jianqing

arXiv.org Machine Learning

In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings, they lack direct applicability to reinforcement learning algorithms. This paper pioneers the study of transfer learning for dynamic decision scenarios modeled by non-stationary finite-horizon Markov decision processes, utilizing neural networks as powerful function approximators and backward inductive learning. We demonstrate that naive sample pooling strategies, effective in regression settings, fail in Markov decision processes. To address this challenge, we introduce a novel ``re-weighted targeting procedure'' to construct ``transferable RL samples'' and propose ``transfer deep $Q^*$-learning'', enabling neural network approximation with theoretical guarantees. We assume that the reward functions are transferable and deal with both situations in which the transition densities are transferable or nontransferable. Our analytical techniques for transfer learning in neural network approximation and transition density transfers have broader implications, extending to supervised transfer learning with neural networks and domain shift scenarios. Empirical experiments on both synthetic and real datasets corroborate the advantages of our method, showcasing its potential for improving decision-making through strategically constructing transferable RL samples in non-stationary reinforcement learning contexts.
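The backward inductive structure underlying transfer deep $Q^*$-learning is easiest to see in the tabular case, where each step's Q-function is computed exactly from the next step's. A minimal sketch on a random non-stationary finite-horizon MDP (the neural-network regression and re-weighted targeting are replaced by exact tables; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Tabular non-stationary finite-horizon MDP: H steps, S states, A actions.
H, S, A = 4, 3, 2
P = rng.dirichlet(np.ones(S), size=(H, S, A))   # P[h, s, a] = next-state dist.
R = rng.uniform(0, 1, size=(H, S, A))           # step-h rewards in [0, 1]

# Backward induction for the optimal Q-functions:
#   Q_h(s, a) = R_h(s, a) + sum_{s'} P_h(s' | s, a) max_{a'} Q_{h+1}(s', a'),
# with Q_H = 0. The deep variant fits a network regression at each step h
# (on pooled transferable samples) instead of storing the exact table.
Q = np.zeros((H + 1, S, A))
for h in range(H - 1, -1, -1):
    V_next = Q[h + 1].max(axis=1)               # (S,) optimal value at h + 1
    Q[h] = R[h] + P[h] @ V_next                 # (S, A) Bellman backup

pi = Q[:H].argmax(axis=2)                       # greedy non-stationary policy
print(Q[0].max(axis=1))                         # optimal values at step 0
```

The pooling failure the paper highlights arises because the regression target at step h depends on the next-step value function, which differs across source populations even when rewards are shared.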